Upgrade AWS Java SDK to v2 #8441
Conversation
yihua left a comment:
LGTM since most of the changes update API usage only. Is there any compatibility issue with different Spark, Flink and Hadoop versions?
  <groupId>com.amazonaws</groupId>
  <artifactId>dynamodb-lock-client</artifactId>
- <version>${dynamodb.lockclient.version}</version>
+ <version>1.2.0</version>
nit: could you keep the version variable?
As in, we do not specify a version and let it pull in the latest version?
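For context, keeping the version variable means resolving the version through a Maven property rather than hardcoding it. A sketch of what that looks like — the property name is the one removed in this diff, and the value shown is the one hardcoded in its place:

```xml
<!-- in the parent pom's <properties> block -->
<properties>
  <dynamodb.lockclient.version>1.2.0</dynamodb.lockclient.version>
</properties>

<!-- the dependency then keeps referencing the property -->
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>dynamodb-lock-client</artifactId>
  <version>${dynamodb.lockclient.version}</version>
</dependency>
```

This keeps all version choices in one place, so a future bump touches a single line.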
- return AmazonCloudWatchAsyncClientBuilder.standard()
-     .withCredentials(HoodieAWSCredentialsProviderFactory.getAwsCredentialsProvider(props))
+ private static CloudWatchAsyncClient getAmazonCloudWatchClient(Properties props) {
+   return CloudWatchAsyncClient.builder()
What does .standard() do before? Is it by default in the builder now?
If you look at the table in the migration guide, for several of these AWS clients it shows a "before" column where .standard() was used in 1.x; in 2.x it is replaced by the .builder() pattern. See this snippet taken from the link above:
Client builders no longer contain static methods. The static methods on the clients must be used: AmazonDynamoDBClientBuilder.defaultClient is now DynamoDbClient.create and AmazonDynamoDBClientBuilder.standard is now DynamoDbClient.builder.
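A self-contained illustration of that change — the class below is a stand-in we made up, not a real SDK type. The point is that in v2 the static factory methods live on the client class itself, so `SomeClientBuilder.standard()` becomes `SomeClient.builder()` and `SomeClientBuilder.defaultClient()` becomes `SomeClient.create()`:

```java
// Stand-in for an SDK v2 client to show the new static-factory shape.
// Names are illustrative only; they are not real AWS SDK identifiers.
final class FakeDynamoDbClient {
    private final String region;

    private FakeDynamoDbClient(String region) { this.region = region; }

    // v2 equivalent of v1's AmazonDynamoDBClientBuilder.defaultClient()
    public static FakeDynamoDbClient create() { return builder().build(); }

    // v2 equivalent of v1's AmazonDynamoDBClientBuilder.standard()
    public static Builder builder() { return new Builder(); }

    public String region() { return region; }

    static final class Builder {
        // default configuration, which is what .standard() implied in v1
        private String region = "us-east-1";

        Builder region(String r) { this.region = r; return this; }

        FakeDynamoDbClient build() { return new FakeDynamoDbClient(region); }
    }
}
```

So the answer to the question above is yes: the defaulting that `.standard()` provided in 1.x is now the behavior of the plain `builder()` on the client.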
hudi-aws/src/main/java/org/apache/hudi/aws/credentials/HoodieAWSCredentialsProviderFactory.java
StorageDescriptor partitionSD = sd.copy(copySd -> copySd.columns(newColumns));
final Instant now = Instant.now();
TableInput updatedTableInput = TableInput.builder()
    .tableType(table.tableType())
Is the table name missing here?
Thanks for catching this. I will fix it and check this class once more to be safe against any other misses like this.
- .withDatabaseName(databaseName)
- .withTableInput(updatedTableInput);
+ StorageDescriptor sd = table.storageDescriptor();
+ StorageDescriptor partitionSD = sd.copy(copySd -> copySd.columns(newColumns));
Here it makes a copy instead of an in-place change. Any reason for doing this?
In SDK v2 the POJOs are immutable, so I can't modify the original StorageDescriptor sd object; the only option is to make a clone and pass that.
From the docs: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/migration-whats-different.html
Immutable POJOs
Clients and operation request and response objects are now immutable and cannot be changed after creation. To reuse a request or response variable, you must build a new object to assign to it.
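A minimal, self-contained analogue of that immutable pattern — the class below is our own stand-in, not the real StorageDescriptor. Since the fields are final, a "modified" object is produced by seeding a fresh builder from the existing one, which is essentially what SDK v2's `copy(...)` / `toBuilder()` do:

```java
import java.util.Collections;
import java.util.List;

// Stand-in for an SDK v2 immutable model object (names are ours, not AWS's).
final class FakeStorageDescriptor {
    private final List<String> columns;

    private FakeStorageDescriptor(Builder b) {
        this.columns = Collections.unmodifiableList(b.columns);
    }

    public static Builder builder() { return new Builder(); }

    public List<String> columns() { return columns; }

    // Seeds a new builder with this object's state; SDK v2's copy(...) works
    // the same way, returning a modified clone instead of mutating in place.
    public Builder toBuilder() { return new Builder().columns(columns); }

    static final class Builder {
        private List<String> columns = Collections.emptyList();

        Builder columns(List<String> cols) { this.columns = cols; return this; }

        FakeStorageDescriptor build() { return new FakeStorageDescriptor(this); }
    }
}
```

After `sd.toBuilder().columns(newColumns).build()`, the original `sd` still holds its old column list, which is why the diff above assigns the result to a new `partitionSD` variable.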
 * // ... start making calls to table ...
 * </pre>
 */
public class DynamoTableUtils {
Why do we have to copy the code from the library?
Yes, unfortunately this is an issue.
Basically, in Hudi we were using the TableUtils class, a src class in v1 that we invoked in the DynamoDB locking feature, but in v2 this class has been moved to a test class; see this commit: aws/aws-sdk-java-v2@973237c.
Because of that we can no longer access the class due to its test scope. I didn't see any replacement DynamoDB utility classes in SDK v2.
So for now I opted to copy the class over and left a comment pointing to where this code can be found, so we don't have to reinvent it here. Let me know if you have any other ideas on how we can go about this.
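For reference, the core of what the copied table-utils code provides is a poll-until-ready loop around DescribeTable. A simplified sketch of that idea, using our own generic helper rather than the SDK's actual code:

```java
import java.util.function.BooleanSupplier;

// Hedged sketch of a waitUntilActive-style helper: poll a readiness check at a
// fixed interval until it succeeds or the deadline passes. This is our own
// simplification, not the code copied from the AWS SDK.
final class WaitUtil {
    static boolean waitUntil(BooleanSupplier ready, long timeoutMs, long intervalMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (ready.getAsBoolean()) {
                return true; // e.g. DescribeTable reported status ACTIVE
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false; // interrupted while waiting
            }
        }
        return false; // timed out before the table became ready
    }
}
```

In the real class the readiness check would call DynamoDB's DescribeTable and compare the returned table status.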
|
|
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>${aws.sdk.httpclient.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpcore</artifactId>
  <version>${aws.sdk.httpcore.version}</version>
</dependency>
hudi-aws module already adds this so there is no need to add both dependencies again?
Makes sense. I will remove these and run the dependency tree to confirm.
<!-- AWS Services -->
<!-- https://mvnrepository.com/artifact/software.amazon.awssdk/aws-java-sdk-sqs -->
<dependency>
  <groupId>software.amazon.awssdk</groupId>
  <artifactId>sqs</artifactId>
  <version>${aws.sdk.version}</version>
</dependency>
Should this be in hudi-aws module? Why is it required now?
It seems that the event source classes inside the hudi-utilities module require this dependency:
https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsSource.java
But I see what you mean: we don't have to place this dependency inside the utilities pom; instead we can place it in the hudi-aws pom: https://github.com/apache/hudi/blob/master/hudi-aws/pom.xml#L144
<include>org.apache.hudi:hudi-aws</include>
<include>software.amazon.awssdk:*</include>
<include>org.apache.httpcomponents:httpclient</include>
<include>org.apache.httpcomponents:httpcore</include>
We should avoid this. If AWS ecosystem is used, hudi-aws-bundle should be used with hudi-utilities-bundle: https://hudi.apache.org/releases/release-0.12.2#bundle-updates.
So we basically remove all of this from the utilities bundle, and then it becomes mandatory for the user to pass both the utilities bundle and the AWS bundle?
So we can remove this change here, and the user will pass the bundle then.
So far my testing has been limited. I can try looking into doing some tests with Flink, but I am curious what combinations of versions I should test before we land this (Hadoop, Spark, Flink)?
Synced offline already some time before. We should test Spark 2.4 and Spark 3.x with Hadoop 2 on S3 to make sure there is no compatibility issue, especially around the S3A file system.
Closing this in favor of #9347.